
Bringup Qwen2.5-1.5B #3506

Open
ChingTsai wants to merge 1 commit into main from jimmytsai/bring-up-qwen2_5-1_5b

Conversation

ChingTsai (Collaborator) commented Mar 26, 2026

Description

  • Bring up Qwen2.5-1.5B.

FIXES: b/495594907

Tests

MaxText -> HF

python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml \
run_name=forward_pass_test_scanned model_name=qwen2.5-1.5b \
tokenizer_path=Qwen/Qwen2.5-1.5B-Instruct load_parameters_path=XXXX \
max_prefill_predict_length=4 max_target_length=8 dataset_type=synthetic \
scan_layers=true per_device_batch_size=1 skip_jax_distributed_system=True dtype=float32 \
--max_kl_div=0.015 --run_hf_model=True \
--hf_model_path=Qwen/Qwen2.5-1.5B-Instruct

Scanned
Unscanned
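`--max_kl_div` is the pass threshold for the logit comparison above. As a sketch, assuming the checker turns each model's logits $z_t$ at position $t$ into next-token probabilities over the vocabulary $\mathcal{V}$, measures the KL divergence of the MaxText (MT) distribution from the HF one, and takes the worst position (the divergence direction and the max-over-positions reduction are assumptions; tests/utils/forward_pass_logit_checker.py is the authority):

```latex
\begin{aligned}
p_t^{\mathrm{HF}} &= \operatorname{softmax}\!\left(z_t^{\mathrm{HF}}\right), \qquad
p_t^{\mathrm{MT}} = \operatorname{softmax}\!\left(z_t^{\mathrm{MT}}\right) \\
D_t &= \sum_{v \in \mathcal{V}} p_t^{\mathrm{HF}}(v)\,\log\frac{p_t^{\mathrm{HF}}(v)}{p_t^{\mathrm{MT}}(v)} \\
\text{pass} &\iff \max_t D_t \le \texttt{max\_kl\_div}
\end{aligned}
```

Under this reading, `--max_kl_div=0.015` means no single checked position may have a divergence above 0.015 (in nats, if the natural log is used).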

HF -> MaxText

python3 -m tests.utils.hf_checkpoint_conversion_checker \
--original_ckpt=hf_cache/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306 \
--converted_ckpt=qwen2.5-1.5b/hf_from_scanned

Log

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


ChingTsai force-pushed the jimmytsai/bring-up-qwen2_5-1_5b branch from b4688d0 to e35f17f (March 26, 2026 02:30)
ChingTsai changed the title from "bringup qwen2.5-1.5B" to "Bringup qwen2.5-1.5B" (Mar 26, 2026)
@github-actions

🤖 Hi @ChingTsai, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions

🤖 I'm sorry @ChingTsai, but I was unable to process your request. Please see the logs for more details.

@github-actions

🤖 I'm sorry @ChingTsai, but I was unable to process your request. Please see the logs for more details.

RissyRan (Collaborator) left a comment


Thanks!

ChingTsai force-pushed the jimmytsai/bring-up-qwen2_5-1_5b branch from e35f17f to 387df2d (March 27, 2026 08:12)
shuningjin (Collaborator) commented Mar 29, 2026

I noticed that we now have three scripts to verify to_huggingface.
(1) tests.utils.forward_pass_logit_checker. This is what we had initially and use most commonly.
(2) maxtext.checkpoint_conversion.compare_hf_ckpt. Introduced by PR 2903.
(3) tests.utils.hf_checkpoint_conversion_checker. Introduced by PR 3113.

@RissyRan: Could we have a unified test process for to_huggingface, as a follow-up?

  • For to_maxtext, we always use (1) as the test: convert HF1 -(to_maxtext)-> maxtext, then compare HF1 and maxtext with the logit check.
  • Similarly, for to_huggingface, we can also use (1): convert maxtext -(to_huggingface)-> HF2, then compare HF2 and maxtext with the logit check, passing --hf_model_path=$HF2.
  • Meanwhile, (2) and (3) appear to perform the same task: comparing HF1 and HF2.
  • I would recommend using (1) for to_huggingface, to align with to_maxtext.
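A minimal sketch of that recommendation, assuming bash: a hypothetical `logit_check` helper (not part of MaxText) that issues the same forward_pass_logit_checker command for both directions, so the only difference between the to_maxtext and to_huggingface tests is `--hf_model_path`. The flags are copied from the qwen3-0.6b example below; `MODEL_NAME`, `TOKENIZER`, and `DRY_RUN` are illustrative variables, not MaxText settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MaxText): one logit-check invocation shared
# by the to_maxtext and to_huggingface tests. Flags are copied from the
# qwen3-0.6b workflow in this thread; only the HF side (--hf_model_path) varies.
# Set DRY_RUN=1 to print the command instead of running it.
MODEL_NAME="${MODEL_NAME:-qwen3-0.6b}"
TOKENIZER="${TOKENIZER:-Qwen/Qwen3-0.6B}"

# $1: MaxText checkpoint path (e.g. .../0/items)
# $2: HF model id (HF1) or local directory written by to_huggingface (HF2)
logit_check() {
  local maxtext_ckpt="$1"
  local hf_model="$2"
  local cmd=(
    python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml
    base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check
    load_parameters_path="${maxtext_ckpt}" scan_layers=true attention=dot_product
    per_device_batch_size=1 model_name="${MODEL_NAME}"
    max_prefill_predict_length=4 max_target_length=4
    async_checkpointing=false sparse_matmul=false
    ici_fsdp_parallelism=1 ici_expert_parallelism=1 checkpoint_storage_concurrent_gb=1024
    weight_dtype=float32 dtype=float32 activations_in_float32=true
    matmul_precision=highest float32_logits=true float32_qk_product=true
    --max_kl_div=3e-4 hardware=cpu skip_jax_distributed_system=True
    --run_hf_model=true --hf_model_path="${hf_model}"
    tokenizer_path="${TOKENIZER}" tokenizer_type=huggingface
  )
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"   # print the command as one space-joined line
  else
    "${cmd[@]}"                 # run it with each argument passed intact
  fi
}
```

Under these assumptions, the to_maxtext test becomes `logit_check "$SCANNED_CKPT_PATH" Qwen/Qwen3-0.6B` and the to_huggingface test becomes `logit_check "$SCANNED_CKPT_PATH" "$HF_PATH"`. With `DRY_RUN=1` both commands can be printed and diffed to confirm that only the HF model path differs.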

Example workflow

Using qwen3-0.6b for demonstration:

to_maxtext

# to_maxtext: HF1 -> maxtext
# HF1: Qwen/Qwen3-0.6B
# maxtext: gs://runner-maxtext-logs/to_maxtext_20260329/0/items

BASE_OUTPUT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329
python3 -m maxtext.checkpoint_conversion.to_maxtext \
src/maxtext/configs/base.yml model_name=qwen3-0.6b scan_layers=true \
base_output_directory=$BASE_OUTPUT_PATH hf_access_token=$HF_TOKEN \
hardware=cpu skip_jax_distributed_system=True \
attention=dot_product \
--eager_load_method=transformers --save_dtype=bfloat16

log: https://paste.googleplex.com/4580852571963392

# test to_maxtext: compare HF1 and maxtext

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml \
base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check \
load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=true attention=dot_product \
per_device_batch_size=1 model_name=qwen3-0.6b max_prefill_predict_length=4 max_target_length=4 \
async_checkpointing=false sparse_matmul=false ici_fsdp_parallelism=1 ici_expert_parallelism=1 \
checkpoint_storage_concurrent_gb=1024 weight_dtype=float32 dtype=float32 activations_in_float32=true \
matmul_precision=highest float32_logits=true float32_qk_product=true --max_kl_div=3e-4 \
hardware=cpu skip_jax_distributed_system=True \
--run_hf_model=true --hf_model_path=Qwen/Qwen3-0.6B tokenizer_path=Qwen/Qwen3-0.6B tokenizer_type=huggingface

log: https://paste.googleplex.com/5375600702390272

to_huggingface

# to_huggingface: maxtext -> HF2
# maxtext: gs://runner-maxtext-logs/to_maxtext_20260329/0/items
# HF2: /tmp/qwen3_hf_20260329

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
HF_PATH=/tmp/qwen3_hf_20260329 
python3 -m maxtext.checkpoint_conversion.to_huggingface \
src/maxtext/configs/base.yml \
model_name=qwen3-0.6b \
scan_layers=true load_parameters_path=$SCANNED_CKPT_PATH \
base_output_directory=$HF_PATH \
weight_dtype=bfloat16 \
skip_jax_distributed_system=true attention=dot_product

log: https://paste.googleplex.com/4844209799561216

# test to_huggingface: compare HF2 and maxtext

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
HF_PATH=/tmp/qwen3_hf_20260329 
python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml \
base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check \
load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=true attention=dot_product \
per_device_batch_size=1 model_name=qwen3-0.6b max_prefill_predict_length=4 max_target_length=4 \
async_checkpointing=false sparse_matmul=false ici_fsdp_parallelism=1 ici_expert_parallelism=1 \
checkpoint_storage_concurrent_gb=1024 weight_dtype=float32 dtype=float32 activations_in_float32=true \
matmul_precision=highest float32_logits=true float32_qk_product=true --max_kl_div=3e-4 \
hardware=cpu skip_jax_distributed_system=True \
--run_hf_model=true --hf_model_path=$HF_PATH tokenizer_path=Qwen/Qwen3-0.6B tokenizer_type=huggingface

log: https://paste.googleplex.com/5559004127428608


3 participants